Nonparametric Statistical Data Modeling*
by 

Emanuel  Parzen 


1. Introduction

"To unlock the analysis of a body of data, to find the good way or
ways to approach it, may require a key whose finding is a creative act,"
writes John Tukey (1977) in the Preface to his book Exploratory Data
Analysis. It is the aim of this paper to introduce new types of keys for
exploratory data analysis (of continuous data) based on estimating the
quantile function and density-quantile function. It appears that this
approach leads to an exploratory data analysis which has a firm probability
base. Consequently the distinction between exploratory and confirmatory
data analysis can be regarded as a distinction between confirmatory
non-parametric statistical data analysis or modeling, and confirmatory
parametric statistical data analysis.

Quantile, quantile-density, density-quantile, and score functions are
defined in Section 2, and their fundamental inter-relations are discussed.
Transformations of observed data which have specified distributions are
studied in Section 3, and formulas are given for their derivatives.
Autoregressive representations of density-quantile functions are introduced in
Section 4. Sample quantile functions and their linear functionals are
defined in Section 5. Goodness of Fit Tests for location and scale parameter
models are introduced in Section 6. Estimators of density-quantile functions
are discussed in Section 7. Section 8 considers two examples, Rayleigh
data and Buffalo snowfall. Section 9 discusses theoretical examples of
density-quantile functions, and their classification according to tail behavior.
Location and scale parameter estimation is discussed in Section 10. Section 11
lists some open research problems for extensions.


2. Quantile functions and density-quantile functions

The distribution function (d.f.) of a random variable X is widely
denoted $F(x) = P[X \le x]$. The random variable X is said to be continuous
(more precisely, absolutely continuous) when F has a probability density
function (p.d.f.) $f(x) = F'(x)$ in terms of which

$$F(x) = \int_{-\infty}^{x} f(y)\,dy .$$

Statistical inference has as one of its major aims the estimation (by
estimators which are "efficient" or Bayesian, etc.) of F(x) and f(x) from
data $X_1, \ldots, X_n$ assumed to be a random sample of X (that is, independent
random variables identically distributed as X, denoted i.i.d.).

Parametric statistical inference assumes a representation for F(x)
and f(x) as functions of a finite number of parameters, and the estimation
problem is posed as one of estimating these parameters. An important
parametrization, called the location and scale parameter model, assumes a
representation

$$F(x) = F_0\!\left(\frac{x-\mu}{\sigma}\right) , \qquad
  f(x) = \frac{1}{\sigma}\, f_0\!\left(\frac{x-\mu}{\sigma}\right) ,$$

where $F_0$ is a specified d.f. and $\mu$ and $\sigma$ are parameters to be estimated
(called location and scale parameters respectively).


The procedures for non-parametrically estimating f(x), and for
estimating $\mu$ and $\sigma$, to be introduced in this paper, begin by estimating
functions with the following Definitions:

Quantile function (q.f.)            $Q(u) = F^{-1}(u)$ ,   $0 \le u \le 1$
Quantile-density function (q.d.f.)  $q(u) = Q'(u)$ ,       $0 < u < 1$
Density-quantile function (d.q.f.)  $fQ(u) = f(Q(u))$ ,    $0 \le u \le 1$
Score function (sc.f.)              $J(u) = -(fQ)'(u)$ ,   $0 < u < 1$ .

These functions arise constantly in non-parametric statistics, but they
do not seem to be usually given names, or have a universally accepted
notation, or be systematically tabulated or discussed (see Hajek and Sidak (1967)).

It is customary mathematical notation to denote a composite function
such as $f(Q(u))$ by $fQ(u)$; we pronounce it the "eff-cue" function.

For a general distribution function $F(\cdot)$ which is only assumed to
be continuous from the right one defines

$$Q(u) = F^{-1}(u) = \inf\{x : F(x) \ge u\} .$$

Properties of Q and F can be deduced from each other, using the following
fundamental Theorem: for all x in $-\infty < x < \infty$ and all u in
$0 < u < 1$,

$$F(x) \ge u \ \text{ if, and only if, } \ Q(u) \le x$$

(for a proof see Roussas (1973), p. 186, where in addition it is shown that
$FQ(u) \ge u$ for any distribution function F).

Theorem. When F is continuous, Q satisfies

$$FQ(u) = u .$$

When F is continuous and strictly increasing there is exactly one x
such that $F(x) = u$; then Q(u) equals this value of x and

$$QF(x) = x .$$

Differentiating $FQ(u) = u$, we obtain (by the rules for differentiating
composite functions) the Reciprocal Theorem:

$$fQ(u)\, q(u) = 1 .$$

In words, fQ and q are reciprocals of each other (which justifies calling
them by names which are the reverses of each other).

The q.d. function q(u) thus plays a pivotal role. From a knowledge
(or estimator) of q one obtains both Q(u) and fQ(u) by the formulas

$$Q(u) - Q(u_0) = \int_{u_0}^{u} q(t)\,dt , \qquad fQ(u) = \frac{1}{q(u)} .$$
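The Reciprocal Theorem and the two recovery formulas can be checked numerically. The following is a minimal sketch, assuming the unit exponential distribution as an illustrative choice (so that $Q(u) = -\log(1-u)$, $q(u) = 1/(1-u)$, $fQ(u) = 1-u$); the function names and the midpoint-rule integrator are hypothetical helpers, not part of the paper.

```python
import math

def Q(u):
    """Quantile function of the unit exponential: Q(u) = -log(1 - u)."""
    return -math.log(1.0 - u)

def q(u):
    """Quantile-density function q(u) = Q'(u) = 1 / (1 - u)."""
    return 1.0 / (1.0 - u)

def fQ(u):
    """Density-quantile function f(Q(u)) = exp(-Q(u)), which equals 1 - u."""
    return math.exp(-Q(u))

def integrate_q(u0, u, steps=20000):
    """Midpoint-rule approximation of the integral of q(t) over [u0, u]."""
    h = (u - u0) / steps
    return h * sum(q(u0 + (k + 0.5) * h) for k in range(steps))

grid = [0.1, 0.25, 0.5, 0.75, 0.9]
reciprocal_products = [fQ(u) * q(u) for u in grid]   # Reciprocal Theorem: each close to 1
recovered_increment = integrate_q(0.25, 0.75)        # approximates Q(0.75) - Q(0.25)
exact_increment = Q(0.75) - Q(0.25)
```

Knowing q alone on a grid is thus enough, up to the additive constant $Q(u_0)$, to rebuild Q and fQ.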


By the rules for differentiation of composite functions

$$(fQ)'(u) = f'Q(u)\, q(u)$$

so the score function satisfies

$$J(u) = -\frac{f'Q(u)}{fQ(u)} = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))} ,$$

which is the customary definition of the score function in the literature
of non-parametric statistics. For purposes of estimation of the score
function in small samples, the usual definition requires one to first estimate
$f'$, $f$, and $F^{-1}$; our definition requires one only to estimate $(fQ)'$, which
we will be able to do by a polynomial in [...].

Many formulas of statistical theory become unified when expressed in
terms of quantile functions, density-quantile functions, and score functions.
The different kinds of tail behavior of distributions clearly correspond to
the behavior of Q(u) as u tends to 1 or 0. The formula defining the
Pearson family of frequency curves, which is of the form (see Elderton and
Johnson (1969))

$$-\frac{f'(x)}{f(x)} = \frac{a_0 + a_1 x}{b_0 + b_1 x + b_2 x^2} ,$$

can be rewritten, by letting $x = Q(u)$, as a relation between J(u) and
Q(u):

$$J(u) = \frac{a_0 + a_1 Q(u)}{b_0 + b_1 Q(u) + b_2 Q^2(u)} .$$

Expectations can be expressed in terms of quantile functions. For any
function g for which the integrals are finite we have the Theorem:

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\,dx = \int_0^1 gQ(u)\,du .$$

To prove this formula, make the change of variables $x = Q(u)$, $u = F(x)$,
$du = f(x)\,dx$. In particular, moments are given by

$$\mu = E(X) = \int_0^1 Q(u)\,du , \qquad E(X^2) = \int_0^1 Q^2(u)\,du ,$$

$$\sigma^2 = \mathrm{Var}(X) = \int_0^1 |Q(u) - \mu|^2\,du .$$
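The moment formulas lend themselves to direct numerical evaluation. The sketch below assumes the unit exponential distribution as an illustrative case (mean 1, variance 1) and uses a simple midpoint rule on (0, 1); both the choice of distribution and the helper `integrate01` are assumptions made for illustration only.

```python
import math

def Q(u):
    """Quantile function of the unit exponential."""
    return -math.log(1.0 - u)

def integrate01(g, steps=100000):
    """Midpoint rule on (0, 1); the midpoints avoid the endpoint u = 1 where Q diverges."""
    h = 1.0 / steps
    return h * sum(g((k + 0.5) * h) for k in range(steps))

mean = integrate01(Q)                                   # integral of Q(u)
second_moment = integrate01(lambda u: Q(u) ** 2)        # integral of Q(u)^2
variance = integrate01(lambda u: (Q(u) - mean) ** 2)    # integral of |Q(u) - mu|^2
```

The same three integrals apply unchanged to any distribution whose quantile function is available in closed form.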


We obtain conditions for the integrability of fQ(u) and log fQ(u)
from the Theorem:

$$\int_0^1 g(fQ(u))\,du = \int_{-\infty}^{\infty} g(f(x))\, f(x)\,dx ,$$

whence

$$\int_0^1 fQ(u)\,du = \int_{-\infty}^{\infty} f^2(x)\,dx , \qquad
  \int_0^1 \log fQ(u)\,du = \int_{-\infty}^{\infty} f(x) \log f(x)\,dx .$$

The right hand integrals are familiar in statistical theory, and we believe
it is because they are evaluations of the integrals of fQ and log fQ.

The reader interested in examples of fQ functions should see Section 9.


3. Transformations

A basic technique of statistical data analysis, and also of statistical
distribution theory, is to transform a continuous random variable X to
a continuous random variable $Y = g(X)$ where g is an increasing continuous
function. To express the distribution function $F_Y$ of Y in terms of
the distribution function $F_X$ of X we have the Theorem:

$$Y = g(X) \ \text{ implies } \ F_Y(y) = F_X(g^{-1}(y)) .$$

However the quantile functions are more explicitly related; under the
assumption that $F_X$ is a strictly increasing continuous distribution
function, we have the Theorem:

$$Y = g(X) \ \text{ implies } \ Q_Y(u) = g(Q_X(u)) \qquad (1)$$

which can be deduced from the fact that

$$F_Y(y) \ge u \ \text{ iff } \ F_X(g^{-1}(y)) \ge u \ \text{ iff } \ g^{-1}(y) \ge Q_X(u) \ \text{ iff } \ y \ge gQ_X(u) .$$

Two important Corollaries are: (i) $Y = \mu + \sigma X$, where $\sigma > 0$, has
quantile function $Q_Y(u) = \mu + \sigma Q_X(u)$; (ii) for X positive, $Y = \log X$
has

quantile function          $Q_Y(u) = \log Q_X(u)$
density-quantile function  $f_Y Q_Y(u) = f_X Q_X(u)\, Q_X(u)$
score function             $J_Y(u) = J_X(u)\, Q_X(u) - 1 .$
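As a worked instance of corollary (ii), one may take X to be unit exponential; this choice is an illustration rather than an example drawn from this section. The three formulas then give the quantile, density-quantile, and score functions of $Y = \log X$ directly:

```latex
% Worked example (illustrative assumption): X unit exponential, Y = \log X.
\begin{align*}
Q_X(u) &= \log\frac{1}{1-u}, & f_XQ_X(u) &= 1-u, & J_X(u) &= 1, \\
Q_Y(u) &= \log\log\frac{1}{1-u}, & f_YQ_Y(u) &= (1-u)\log\frac{1}{1-u},
       & J_Y(u) &= \log\frac{1}{1-u} - 1 .
\end{align*}
% Check: -(f_YQ_Y)'(u) = -\bigl[\log(1-u) + 1\bigr] = \log\frac{1}{1-u} - 1 = J_Y(u).
```

The resulting $Q_Y$ is the extreme value quantile function listed among the Theorems that follow.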


Since a scale parameter can be converted to a location parameter by taking
logarithms, it is not surprising that the function $Q_X(u)\, J_X(u)$ arises
often in the study of location and scale parameters.

One should keep handy a table of quantile functions of familiar probability
laws. If the quantile function of X can be transformed to the
quantile function of Y by an increasing continuous transformation g,
then to transform X to data identically distributed as Y, form g(X).
By perusing a table of quantile functions one immediately obtains the
following Theorems (where $Q_0(u)$ appears in parentheses):

(i)  log X is extreme value distributed ($\log\log\frac{1}{1-u}$)
     if X is exponential ($\log\frac{1}{1-u}$) or Weibull ($\{\log\frac{1}{1-u}\}^{\beta}$);

(ii) if log X is exponential, or log log X is extreme
     value, then X is Pareto ($\{1-u\}^{-\beta}$).

The probability density functions of these distributions are recalled
in Section 9.


To simulate a continuous random variable X, one starts with U which
is uniformly distributed on 0 to 1 and seeks an increasing function $\Psi_1$
such that $\Psi_1(U)$ and X are identically distributed; (1) implies that
$\Psi_1(u) = Q(u)$, as is well known.
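The simulation recipe and relation (1) can be sketched in a few lines. The example below assumes two illustrative targets, the unit exponential and a Pareto law with quantile function $(1-u)^{-\beta}$ with $\beta = 0.5$; the parameter value, seed, and function names are hypothetical.

```python
import math
import random

def Q_exponential(u):
    """Quantile function of the unit exponential."""
    return -math.log(1.0 - u)

def Q_pareto(u, beta=0.5):
    """Pareto quantile function (1 - u)^(-beta); beta = 0.5 is an illustrative choice."""
    return (1.0 - u) ** (-beta)

def simulate(Q, n, rng):
    """Form Q(U) for n independent uniforms U on [0, 1)."""
    return [Q(rng.random()) for _ in range(n)]

rng = random.Random(0)
exp_sample = simulate(Q_exponential, 20000, rng)
exp_mean = sum(exp_sample) / len(exp_sample)          # should be near 1

# Relation (1) with g = log: the median of log X equals the log of the median of X.
pareto_sample = sorted(simulate(Q_pareto, 20001, rng))
log_sample = sorted(math.log(x) for x in pareto_sample)
median_x = pareto_sample[10000]
median_log_x = log_sample[10000]
```

Because g = log is increasing, sorting commutes with the transformation, which is the sample counterpart of $Q_Y(u) = g(Q_X(u))$.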


Our aim in this paper is to show how to estimate from data increasing
functions $\Psi$ and $\Psi_1$, such that

$$\Psi(X) \sim Y , \qquad \Psi_1(Y) \sim X ,$$

where $\sim$ means identically distributed as.

When an observed random variable X is not normal (or exponential)
one seeks to find a transformation of data which is normal (or exponential).
The cumulative hazard function H(x) in reliability theory, defined by

$$H(x) = -\log\{1 - F(x)\} ,$$

has the property that H(X) is exponential with mean 1. Thus estimating
H(x) can be regarded as actually estimating a transformation to exponentiality.
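The property that H(X) is exponential with mean 1 can be illustrated by simulation. The sketch below assumes a Weibull d.f. $F(x) = 1 - \exp(-x^c)$ with shape $c = 2$ (an arbitrary illustrative choice), for which $H(x) = x^c$; it uses the standard-library generator `random.weibullvariate(scale, shape)`.

```python
import random

SHAPE = 2.0  # illustrative Weibull shape; the scale is fixed at 1

def cumulative_hazard(x, shape=SHAPE):
    """H(x) = -log(1 - F(x)) for the Weibull d.f. F(x) = 1 - exp(-x**shape)."""
    return x ** shape

rng = random.Random(1)
xs = [rng.weibullvariate(1.0, SHAPE) for _ in range(20000)]
hs = [cumulative_hazard(x) for x in xs]

# A unit exponential has mean 1 and variance 1; H(X) should reproduce both.
h_mean = sum(hs) / len(hs)
h_var = sum((h - h_mean) ** 2 for h in hs) / len(hs)
```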

We are thus led to consider the problem of estimating the transformation
$\Psi$ such that $\Psi(X)$ has a prescribed distribution function $F_0$; further,
let $\Psi_1$ be the transformation such that $\Psi_1(Y) \sim X$ where Y is a random
variable with d.f. $F_0$. Using suitable axioms that $\Psi$ and $\Psi_1$ be
monotone functions, one could prove

$$\Psi(x) = Q_0F(x) , \qquad \Psi_1(y) = QF_0(y) ,$$

where F and Q denote the d.f. and q.f. of X. We define these to be
the transformations desired since clearly

$$Q_0F(X) \sim Y , \qquad QF_0(Y) \sim X .$$

To find $\Psi$ and $\Psi_1$ we will find their derivatives

$$\psi(x) = \Psi'(x) , \qquad \psi_1(y) = \Psi_1'(y) .$$

The definitions $\Psi = Q_0F$ and $\Psi_1 = QF_0$ imply the Theorem:

$$\psi(x) = q_0(F(x))\, f(x) , \qquad \psi_1(y) = q(F_0(y))\, f_0(y) .$$

Now let $x = Q(u)$ and $y = Q_0(u)$; we obtain the Theorem:

$$\psi Q(u) = q_0(u)\, fQ(u) = \frac{fQ(u)}{f_0Q_0(u)} , \qquad
  \psi_1 Q_0(u) = q(u)\, f_0Q_0(u) = \frac{f_0Q_0(u)}{fQ(u)} .$$

One immediate conclusion is that $\psi Q$ and $\psi_1 Q_0$ are reciprocal functions,
so estimating one immediately yields the other.

A second conclusion is that estimating $\psi Q$ and estimating fQ are
equivalent problems since $f_0Q_0$ is a known function.

The function which turns out to be natural to estimate is denoted
d(u), to indicate that it is a density, with the Definition:

$$d(u) = \frac{1}{\sigma_0}\, f_0Q_0(u)\, q(u) = \frac{1}{\sigma_0}\, \frac{f_0Q_0(u)}{fQ(u)} ,$$

where $\sigma_0$ is a normalizing constant with the Definition:

$$\sigma_0 = \int_0^1 f_0Q_0(u)\, q(u)\,du = \int_0^1 \frac{f_0Q_0(u)}{fQ(u)}\,du .$$

Conditions for $\sigma_0$ to be finite are easily obtained from our general
classification of fQ functions. $\sigma_0$ can be regarded as a scale parameter; its
relationship to other measures of scale will be derived from the Theorem
(which follows by integration by parts, with $J_0 = -(f_0Q_0)'$ the score
function of $F_0$):

$$\sigma_0 = \int_0^1 J_0(u)\, Q(u)\,du ,$$

assuming $f_0Q_0(u)\, Q(u) = 0$ for $u = 0, 1$.

We find it convenient to introduce the following terminology and
Definitions: d(u) is the $f_0Q_0$-transformation density of X,

$$D(u) = \int_0^u d(t)\,dt , \quad 0 \le u \le 1 ,$$

is the $f_0Q_0$-transformation distribution function of X, and

$$\phi(v) = \int_0^1 e^{2\pi i u v}\, d(u)\,du , \quad v = 0, \pm 1, \ldots$$

is the $f_0Q_0$-transformation correlation function of X.

A distribution function equal to $\sigma_0 D(u)$ has been extensively studied
in reliability theory (see Barlow and Doksum (1972)) under the notation

$$H_F^{-1}(u) = \int_0^{F^{-1}(u)} f_0[F_0^{-1}F(x)]\,dx ,$$

which we write in our notation, letting $t = F(x)$,

$$H_F^{-1}(u) = \int_0^u f_0Q_0(t)\, q(t)\,dt .$$

What is novel in our approach is that we consider the density function and
Fourier transform of this distribution function.

Recently, Barlow and Campo (1975) and Barlow and Proschan (1977) have
studied the statistic

$$V(u) = \int_0^{F^{-1}(u)} \{1 - F(x)\}\,dx ,$$

which they call the total time on test transform of the distribution F, and
use it to test for exponentiality. It is the same as our $\sigma_0 D(u)$ with
$f_0Q_0(u) = 1 - u$, the density-quantile function of the exponential distribution.


4. Density-Quantile Autoregressive Representations as Generalizations of
Goodness of Fit Hypotheses

The concepts have now been defined to state our new approach to
statistical data analysis. Given a random sample $X_1, \ldots, X_n$ of a random
variable X one would like to test the hypothesis $H_0$ that the data is
normal (or exponential or any other specified type) and/or one would like to find
a transformation of the data after which it is normal (or exponential or
any other specified type). By a specified type we mean that the true d.f.
F is of the location-scale parameter form

$$F(x) = F_0\!\left(\frac{x-\mu}{\sigma}\right) ,$$

where $\mu$ and $\sigma$ are parameters to be efficiently estimated, and $F_0$ is
specified. When testing normality, $F_0(x) = \Phi(x)$, the standard normal
distribution function.

Theorem: $H_0$ is equivalent to any one of the following hypotheses:

$$Q(u) = \mu + \sigma Q_0(u) , \quad q(u) = \sigma q_0(u) , \quad fQ(u) = \frac{1}{\sigma}\, f_0Q_0(u) ,$$

$$d(u) = 1 , \quad D(u) = u , \quad \phi(v) = 0 \ \text{ for } \ v \ne 0 .$$

When the density d(u) is constant, it is called "white noise" in
honor of an analogous situation in time series analysis. An approach to
testing this hypothesis which also provides an estimator of d(u) when we
do not believe it to be a constant is to represent it in a form called an
autoregressive representation (since it is analogous to the spectral
density of an autoregressive scheme in time series analysis).

Definition: A density d(u) is said to be autoregressive of order m,
or to have an autoregressive representation of order m, if it is of the form

$$d(u) = K_m \left| 1 + \alpha_m(1)\, e^{2\pi i u} + \ldots + \alpha_m(m)\, e^{2\pi i u m} \right|^{-2} \qquad (1)$$

where m is an integer called the order (whose determination is the most
difficult estimation problem), $K_m$ is a positive constant (corresponding
to the finite memory m one-step ahead mean square prediction error), and
$\alpha_m(1), \ldots, \alpha_m(m)$ are complex-valued coefficients satisfying the condition
that

$$g_m(z) = 1 + \alpha_m(1)\, z + \ldots + \alpha_m(m)\, z^m$$

has all its roots outside the unit circle. (For future reference note that
$z^*$ denotes the complex conjugate of z.)

When $d(u) = \frac{1}{\sigma_0} \frac{f_0Q_0(u)}{fQ(u)}$ is autoregressive of order m, one obtains a
representation for fQ which generalizes the formula which holds in the
location and scale parameter model:

$$fQ(u) = c_m \left| 1 + \alpha_m(1)\, e^{2\pi i u} + \ldots + \alpha_m(m)\, e^{2\pi i u m} \right|^{2} f_0Q_0(u) \qquad (2)$$

where

$$c_m^{-1} = \sigma_0 K_m = \int_0^1 \left| 1 + \alpha_m(1)\, e^{2\pi i u} + \ldots + \alpha_m(m)\, e^{2\pi i u m} \right|^{2} f_0Q_0(u)\, q(u)\,du .$$

In fact we use low order schemes to represent d(u). We thus consider
successively representations for fQ(u) of the form

m = 0:  $fQ(u) = c_0\, f_0Q_0(u)$
m = 1:  $fQ(u) = c_1 \left| 1 + \alpha_1(1)\, e^{2\pi i u} \right|^2 f_0Q_0(u)$
m = 2:  $fQ(u) = c_2 \left| 1 + \alpha_2(1)\, e^{2\pi i u} + \alpha_2(2)\, e^{2\pi i u 2} \right|^2 f_0Q_0(u)$

and so on. It is clear that we have a sequence of representations for fQ
which start with the hypothesis $H_0$ and ascend to the general representation

$$fQ(u) = f_0Q_0(u)\, c_\infty \left| 1 + \alpha_\infty(1)\, e^{2\pi i u} + \ldots + \alpha_\infty(m)\, e^{2\pi i u m} + \ldots \right|^2 \qquad (3)$$

The infinite-order autoregressive representation (3) holds when conditions
such as the following are true (see Geronimus (1960)): first,

$$\frac{f_0Q_0(u)}{fQ(u)} , \quad \frac{fQ(u)}{f_0Q_0(u)} , \quad \log fQ(u) , \quad \log f_0Q_0(u)$$

are all integrable over $0 \le u \le 1$; second, fQ and $f_0Q_0$ satisfy a
smoothness condition such as differentiability. The speed of convergence
of the approximations of order m to the infinite order case depends on
the number of derivatives that exist, and is exponentially fast for
infinitely differentiable functions.


Theorem: The coefficients of an autoregressive representation of order m
for the $f_0Q_0$-transformation density d(u) can be computed from a knowledge
of the $f_0Q_0$-transformation correlations $\phi(0), \phi(1), \phi(-1), \ldots, \phi(m), \phi(-m)$
up to lag m using the difference equation satisfied by $\phi(v)$:

$$\phi(-v) + \alpha_m(1)\, \phi(1-v) + \ldots + \alpha_m(m)\, \phi(m-v) = 0 , \quad v > 0 ,$$

$$\phi(0) + \alpha_m(1)\, \phi(1) + \ldots + \alpha_m(m)\, \phi(m) = K_m .$$

Proof: Since

$$d(u) = K_m \left\{ g_m(e^{2\pi i u}) \left( g_m(e^{2\pi i u}) \right)^* \right\}^{-1} ,$$

we can write

$$\phi(-v) + \alpha_m(1)\, \phi(1-v) + \ldots + \alpha_m(m)\, \phi(m-v)
  = \int_0^1 e^{-2\pi i u v}\, g_m(e^{2\pi i u})\, d(u)\,du
  = \int_0^1 e^{-2\pi i u v}\, K_m \left\{ \left( g_m(e^{2\pi i u}) \right)^* \right\}^{-1} du .$$

Now $\left( g_m(e^{2\pi i u}) \right)^*$ is a polynomial in $e^{-2\pi i u}$ whose reciprocal has a
convergent power series in positive powers of $e^{-2\pi i u}$ (with constant term equal
to 1) by virtue of the assumption on the location of the zeroes of $g_m(z)$.
Since $\int_0^1 e^{-2\pi i u v}\, e^{-2\pi i u k}\,du = 0$ for $v > 0$ and $k \ge 0$, the above
expression equals 0 for $v > 0$, and equals $K_m$ for $v = 0$.
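The Theorem gives a practical recipe: compute correlations up to lag m, solve a small linear system, and read off $K_m$. The following is a minimal sketch of that recipe, assuming an illustrative order-2 density with real coefficients 0.5 and 0.2 (whose polynomial has roots of modulus about 2.24, outside the unit circle); the quadrature and the small Gaussian-elimination solver are hypothetical helpers, not the author's program.

```python
import cmath
import math

def ar_density(u, alphas, K):
    """d(u) = K / |1 + alpha(1) e^{2 pi i u} + ... + alpha(m) e^{2 pi i u m}|^2."""
    g = 1.0 + sum(a * cmath.exp(2j * math.pi * u * (j + 1)) for j, a in enumerate(alphas))
    return K / abs(g) ** 2

def correlation(d, v, steps=2048):
    """phi(v) = integral over [0, 1] of e^{2 pi i u v} d(u) du, by the midpoint rule."""
    h = 1.0 / steps
    return h * sum(cmath.exp(2j * math.pi * (k + 0.5) * h * v) * d((k + 0.5) * h)
                   for k in range(steps))

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small complex system."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def yule_walker(phi, m):
    """Solve phi(-v) + sum_j alpha(j) phi(j - v) = 0 for v = 1..m, then K from v = 0."""
    A = [[phi(j - v) for j in range(1, m + 1)] for v in range(1, m + 1)]
    b = [-phi(-v) for v in range(1, m + 1)]
    alphas = solve_linear(A, b)
    K = phi(0) + sum(a * phi(j + 1) for j, a in enumerate(alphas))
    return alphas, K

true_alphas = [0.5, 0.2]                                  # illustrative coefficients
unnormalized = lambda u: ar_density(u, true_alphas, 1.0)
K_true = 1.0 / correlation(unnormalized, 0).real          # scales d to integrate to 1
d = lambda u: ar_density(u, true_alphas, K_true)

cache = {v: correlation(d, v) for v in range(-2, 3)}      # phi(-2), ..., phi(2)
alphas, K = yule_walker(lambda v: cache[v], 2)
```

The recovered `alphas` and `K` should agree with `true_alphas` and `K_true` up to quadrature error, which is negligible here because the integrand is smooth and periodic.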


5. Sample Quantile Function

Given a sample $X_1, \ldots, X_n$ of a continuous random variable X, we
denote the empirical distribution function (EDF) by $\tilde F(x)$, read F wiggle;
it is defined by

$$\tilde F(x) = \text{fraction of } X_1, \ldots, X_n \le x .$$

We shall give several definitions of the empirical quantile function
(EQF) denoted $\tilde Q(u)$. The first definition is

$$\tilde Q(u) = \tilde F^{-1}(u) = \inf\{x : \tilde F(x) \ge u\} .$$

It is a piecewise constant function whose values are the order statistics
$X_{(1)} < X_{(2)} < \ldots < X_{(n)}$; more precisely,

$$\tilde Q(u) = X_{(j)} \ \text{ for } \ \frac{j-1}{n} < u \le \frac{j}{n} , \quad j = 1, \ldots, n .$$

For $u = 0$ we define $\tilde Q(0) = X_{(0)}$ where $X_{(0)}$ is taken to be either the
sample minimum $X_{(1)}$ or a natural minimum when one is available (when X
is non-negative, one might take $X_{(0)} = 0$).

If one desires to form a smooth function from a wiggly function, it
seems reasonable to start with the smoothest reasonable definition (which is
differentiable if possible). Consequently a preferable definition of $\tilde Q(u)$
might be the piecewise linear function

$$\tilde Q(u) = n\left(\frac{j}{n} - u\right) X_{(j-1)} + n\left(u - \frac{j-1}{n}\right) X_{(j)}
  \ \text{ for } \ \frac{j-1}{n} \le u \le \frac{j}{n} , \quad j = 1, \ldots, n .$$

Then $\tilde q(u) = \tilde Q'(u)$ is given by

$$\tilde q(u) = n\left(X_{(j)} - X_{(j-1)}\right) \ \text{ for } \ \frac{j-1}{n} < u < \frac{j}{n} , \quad j = 1, \ldots, n .$$

We call $n(X_{(j)} - X_{(j-1)})$, $j = 1, \ldots, n$, the spacings of the sample
(see Pyke (1965), (1972)).
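The first two definitions of the EQF and the spacings can be sketched as follows. The function names, the optional `x0` argument standing for $X_{(0)}$, and the four-point sample are illustrative assumptions.

```python
import math

def eqf_step(order_stats, u, x0=None):
    """First definition: Q~(u) = X_(j) for (j-1)/n < u <= j/n, and Q~(0) = X_(0)."""
    n = len(order_stats)
    if u == 0.0:
        return order_stats[0] if x0 is None else x0
    j = math.ceil(u * n)            # smallest j with u <= j/n
    return order_stats[j - 1]

def eqf_linear(order_stats, u, x0=None):
    """Second definition: linear interpolation between X_(j-1) at (j-1)/n and X_(j) at j/n."""
    n = len(order_stats)
    xs = [order_stats[0] if x0 is None else x0] + list(order_stats)   # xs[j] = X_(j)
    j = min(n, max(1, math.ceil(u * n)))
    return n * (j / n - u) * xs[j - 1] + n * (u - (j - 1) / n) * xs[j]

def spacings(order_stats, x0=None):
    """n (X_(j) - X_(j-1)), j = 1..n: the values taken by q~(u) on ((j-1)/n, j/n)."""
    n = len(order_stats)
    xs = [order_stats[0] if x0 is None else x0] + list(order_stats)
    return [n * (xs[j] - xs[j - 1]) for j in range(1, n + 1)]

sample = sorted([2.0, 0.5, 1.0, 4.0])            # order statistics 0.5, 1.0, 2.0, 4.0
step_values = [eqf_step(sample, u) for u in (0.25, 0.3, 0.5, 1.0)]
linear_mid = eqf_linear(sample, 0.375, x0=0.0)   # halfway between X_(1) and X_(2)
sample_spacings = spacings(sample, x0=0.0)
```

With $X_{(0)} = 0$ the spacings average to $X_{(n)} - X_{(0)}$, since $\tilde q$ integrates to the range of $\tilde Q$.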


The most important fact about $\tilde q(u)$ is
that it is asymptotically exponentially distributed with mean q(u). The
sample spectral density of a stationary time series has an analogous property.
Consequently there is an isomorphism between spacings and sample spectral
densities; to any result about one there is an analogous result about the
other. The methods of proof and exact hypotheses may need to be different
for the two cases, but the statement of the conclusion is usually found to
be the same.

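Estimators which may have better behavior in small samples from symmetric
densities can be obtained by adopting a shifted piecewise linear function as
the definition of $\tilde Q(u)$:

$$\tilde Q(u) = n\left(\frac{2j+1}{2n} - u\right) X_{(j)} + n\left(u - \frac{2j-1}{2n}\right) X_{(j+1)}
  \ \text{ for } \ \frac{2j-1}{2n} \le u \le \frac{2j+1}{2n} , \quad j = 1, \ldots, n-1 ;$$

it is undefined for $u < \frac{1}{2n}$ or $u > 1 - \frac{1}{2n}$. Its derivative is

$$\tilde q(u) = n\left(X_{(j+1)} - X_{(j)}\right) , \quad \frac{2j-1}{2n} < u < \frac{2j+1}{2n} .$$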

Finally a Bayesian definition of $\tilde Q(u)$ can be adopted, using the
Fractional Order Statistics Process defined by Stigler (1977).

For plotting of sample quantile functions, we have found it useful to
normalize them:

$$\frac{\tilde Q(u) - \tilde Q(0)}{\tilde Q(1) - \tilde Q(0)} .$$

This is a monotone function on $0 \le u \le 1$ whose values lie between 0 and
1. In my view, normalized graphs enable one to apply the experience obtained
in analyzing data of one kind to the analysis of data of another kind.

The asymptotic distribution of the quantile process $\tilde Q(u)$, $0 \le u \le 1$,
is usually studied in the literature for the first definition; the work most
useful to us is that of Csorgo and Revesz (1975), (1978) described in
Sections 9 and 10 of this paper (see also Shorack (1972)). An open research
problem is to show that this asymptotic distribution theory applies also to
the other definitions of $\tilde Q$ we have given.


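The basic estimators we form in practice are linear functionals
$T = \int_0^1 W(u)\,d\tilde Q(u)$. For the first definition of $\tilde Q$, whose jumps
$X_{(j)} - X_{(j-1)}$ occur at $u = (j-1)/n$,

$$T = \sum_{j=1}^{n} W\!\left(\frac{j-1}{n}\right) \left(X_{(j)} - X_{(j-1)}\right) .$$

For the second definition of $\tilde Q$,

$$T = \sum_{j=1}^{n} n\left(X_{(j)} - X_{(j-1)}\right) \int_{(j-1)/n}^{j/n} W(u)\,du .$$

We might evaluate the integral by a simple Simpson's rule approximation:

$$\int_{(j-1)/n}^{j/n} W(u)\,du \approx \frac{1}{6n} \left[ W\!\left(\frac{j-1}{n}\right) + 4\, W\!\left(\frac{2j-1}{2n}\right) + W\!\left(\frac{j}{n}\right) \right] .$$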
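For the second (piecewise linear) definition, the linear functional with Simpson's rule on each interval can be sketched as below. The helper names and the sample are illustrative assumptions; the weight $W(u) = 1 - u$ is the exponential density-quantile function mentioned in Section 3.

```python
def simpson(W, a, b):
    """Simpson's rule on one interval: (b - a)/6 * [W(a) + 4 W((a + b)/2) + W(b)]."""
    return (b - a) / 6.0 * (W(a) + 4.0 * W(0.5 * (a + b)) + W(b))

def linear_functional(order_stats, W, x0):
    """T = sum over j of n (X_(j) - X_(j-1)) times the integral of W over [(j-1)/n, j/n]."""
    n = len(order_stats)
    xs = [x0] + list(order_stats)                 # xs[j] = X_(j), with X_(0) = x0
    return sum(n * (xs[j] - xs[j - 1]) * simpson(W, (j - 1) / n, j / n)
               for j in range(1, n + 1))

sample = [0.5, 1.0, 2.0, 4.0]                     # already sorted
T_range = linear_functional(sample, lambda u: 1.0, x0=0.0)          # W = 1 gives X_(n) - X_(0)
T_exponential = linear_functional(sample, lambda u: 1.0 - u, x0=0.0)
```

Simpson's rule is exact for polynomial weights up to degree three, so both values above carry no quadrature error; for $W(u) = 1 - u$ the result equals the trapezoidal integral of the piecewise linear EQF.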


For the third definition of $\tilde Q$,

$$T = \sum_{j=1}^{n-1} n\left(X_{(j+1)} - X_{(j)}\right) \int_{(2j-1)/2n}^{(2j+1)/2n} W(u)\,du .$$

When $W(u) = e^{2\pi i u v} f_0Q_0(u)$, we might approximate the last integral by

$$f_0Q_0\!\left(\frac{j}{n}\right) e^{2\pi i v (j/n)}\, \frac{\sin(\pi v / n)}{\pi v} .$$

The distribution theory of linear functions of order statistics has
an extensive literature (see Chernoff, Gastwirth, and Johns (1967),
Moore (1968), Stigler (1974)).

6. Goodness of Fit Tests

Given a sample $X_1, \ldots, X_n$ of a continuous random variable X, one
forms the EQF $\tilde Q(u)$ and empirical quantile-density function $\tilde q(u)$. Then
for each probability law type whose goodness of fit one might want to test,
there is a corresponding standard $f_0Q_0$ function. For each specified $f_0Q_0$
function one would compute:

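I. Sample Transformation-Density Function or Weighted Spacings

$$\tilde d(u) = \frac{1}{\tilde\sigma_0}\, f_0Q_0(u)\, \tilde q(u) , \qquad
  \tilde\sigma_0 = \int_0^1 f_0Q_0(u)\, \tilde q(u)\,du .$$

II. Sample Transformation-Distribution Function or Cumulative Weighted Spacings

$$\tilde D(u) = \int_0^u \tilde d(t)\,dt , \quad 0 \le u \le 1 .$$

III. Sample Transformation Correlations

$$\tilde\phi(v) = \int_0^1 e^{2\pi i u v}\, \tilde d(u)\,du , \quad v = 0, \pm 1, \pm 2, \ldots$$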
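The first two sample functions can be computed in closed form for the exponential case $f_0Q_0(u) = 1 - u$ when the piecewise linear EQF is used. The sketch below is one way to do so under those assumptions; the simulated data (exponential with mean 2), the seed, and the function name are illustrative.

```python
import random

def weighted_spacings_exponential(order_stats, x0=0.0):
    """Return (sigma0, D) for f0Q0(u) = 1 - u with the piecewise linear EQF.

    D[j] is D~(j/n), j = 0..n: the cumulative weighted spacings divided by
    sigma0, the integral of f0Q0(u) q~(u) over [0, 1].
    """
    n = len(order_stats)
    xs = [x0] + list(order_stats)
    # On ((j-1)/n, j/n): q~ = n (X_(j) - X_(j-1)), and the integral of (1 - u)
    # over that interval is (1 - (2j - 1)/(2n)) / n.
    pieces = [(xs[j] - xs[j - 1]) * (1.0 - (2 * j - 1) / (2.0 * n)) for j in range(1, n + 1)]
    sigma0 = sum(pieces)
    D, running = [0.0], 0.0
    for p in pieces:
        running += p
        D.append(running / sigma0)
    return sigma0, D

rng = random.Random(2)
n = 2000
data = sorted(rng.expovariate(0.5) for _ in range(n))   # exponential with mean 2
sigma0, D = weighted_spacings_exponential(data)
sup_deviation = max(abs(D[j] - j / n) for j in range(n + 1))
```

Under the exponential hypothesis the computed $\tilde\sigma_0$ estimates the scale parameter and $\tilde D(u)$ stays close to u, so `sup_deviation` is small for data that really are exponential.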


To test the Goodness of Fit Hypothesis $H_0$ one has available test
statistics as follows:

I.   $\max_{0 \le u \le 1} \tilde d(u)$ ,  $\int_0^1 \log \tilde d(u)\,du$

II.  $\tilde D(0.5)$ ,  $\tilde D(.75) - \tilde D(.25)$ ,
     $\sup_{0 \le u \le 1} |\tilde D(u) - u|$ ,  $\int_0^1 |\tilde D(u) - u|^2\,du$ ,
     $\int_0^1 W(u)\,d\tilde D(u)$ for a specified W(u)

III. sequence $|\tilde\phi(1)|^2, |\tilde\phi(2)|^2, \ldots$ ,
     $\sum_{v \ne 0} k(v)\, |\tilde\phi(v)|^2$ for a specified k(v).

The distribution theory of many of these statistics has already been
studied in the literature. For a general D(u), the almost sure
convergence to 0 of $\sup_{0 \le u \le 1} |\tilde D(u) - D(u)|$ was proved by Barlow and van Zwet
(1970). The asymptotic distribution of $\tilde\sigma_0$ was found by Weiss (1964).
The asymptotic distribution of $\int_0^1 \log \tilde d(u)\,du$ is the same as that of the
sample innovation variance, as given by Davis and Jones (1968) and Hannan
and Nicholls (1977).

Under $H_0$, the asymptotic distribution of $\max_{0 \le u \le 1} \tilde d(u)$ is the same
as the distribution in time series analysis (first found by Fisher (1929))
of the maximum normalized periodogram ordinate of white noise.


An important open research problem is the following Conjecture:
under $H_0$, the stochastic process $\{\tilde D(u) - u\}$, $0 \le u \le 1$, is
asymptotically distributed as a Brownian Bridge process B(u), $0 \le u \le 1$;
this has been proved for $f_0Q_0(u) = 1 - u$, corresponding to the exponential
distribution (Barlow (1976), personal communication). It would then follow
that all statistics based on $\tilde D(u) - u$ have the same asymptotic distribution
theory as the corresponding statistics based on $\tilde F(x) - x$, $0 \le x \le 1$, where
$\tilde F(x)$ is the EDF of a random sample from a uniform distribution on [0,1]
whose theory is summarized by Durbin (1973).

The foregoing framework includes as special cases many goodness of fit
test statistics that are being proposed (for example, Andrews' test for
normality (Gnanadesikan (1977), p. 165) and tests for Weibull and extreme
value distributions introduced by Mann and Fertig (1975)).

In the next section we propose additional Goodness of Fit Tests based
on determining the order of an autoregressive smoother to $\tilde d(u)$.

7. Density-Quantile Autoregressive Estimation

Given a sample $X_1, \ldots, X_n$ of a continuous random variable, we have
discussed how to test a Goodness of Fit Hypothesis by forming the sample
functions $\tilde d(u)$, $\tilde D(u)$, and $\tilde\phi(v)$. In this section we discuss how to
form autoregressive densities of order m, $\hat d_m(u)$, $m = 0, 1, \ldots$, which are
candidates for estimators of the true density d(u). The sequence has the
property that $\hat d_0(u)$ is constant (identically equal to 1) and $\hat d_m(u)$
tends to $\tilde d(u)$ as m increases.

For time series spectral estimation by autoregressive estimators
Parzen (1974), (1977) has introduced a criterion called CAT (criterion
autoregressive transfer function) for determining the optimal order $\hat m$ such that
$\hat d_{\hat m}(u)$ is an optimal estimator of d(u). We calculate an analogous criterion
for smooth densities $\hat d_m(u)$, defined by

$$\mathrm{CAT}(m) = \frac{1}{n} \sum_{j=1}^{m} \hat K_j^{-1} - \hat K_m^{-1} .$$

The distribution theory of CAT(m) is known approximately only under $H_0$.
Consequently, at the present time we regard CAT as interpretable only when it
chooses $\hat m = 0$; then we regard it as additional confirmation that $H_0$ holds
(when this hypothesis is accepted by tests based on $\tilde d$, $\tilde D$, and/or $|\tilde\phi|^2$).

A graphical approach to choosing the appropriate smooth estimator
$\hat d_m(u)$ which is the most likely estimator of d(u) is to use as a criterion
how $\hat D_m(u) = \int_0^u \hat d_m(t)\,dt$ fits $\tilde D(u)$. If it fits too well one has
over-smoothed, and the density $\hat d_m(u)$ will have spurious modes. One wants
$\hat D_m(u)$ to follow $\tilde D(u)$ but not slavishly.

Next we define the autoregressive estimators and state a theorem
concerning their consistency.

The autoregressive smoother of order m, denoted $\hat d_m(u)$, is defined
to be

$$\hat d_m(u) = \hat K_m \left| 1 + \hat\alpha_m(1)\, e^{2\pi i u} + \ldots + \hat\alpha_m(m)\, e^{2\pi i u m} \right|^{-2} ,$$

where $\hat\alpha_m(1), \ldots, \hat\alpha_m(m)$ are the values of $\alpha_m(1), \ldots, \alpha_m(m)$ minimizing

$$\int_0^1 \left| g_m(e^{2\pi i u}) \right|^2 \tilde d(u)\,du ,$$

where $g_m(z) = 1 + \alpha_m(1)\, z + \ldots + \alpha_m(m)\, z^m$, and

$$\hat K_m = \int_0^1 \left| \hat g_m(e^{2\pi i u}) \right|^2 \tilde d(u)\,du ,$$

where $\hat g_m(z) = 1 + \hat\alpha_m(1)\, z + \ldots + \hat\alpha_m(m)\, z^m$.

By the projection theorem in Hilbert space, $\hat g_m(z)$ satisfies the
orthogonality conditions

$$\int_0^1 \hat g_m(e^{2\pi i u})\, e^{-2\pi i u v}\, \tilde d(u)\,du = 0$$

for $v = 1, \ldots, m$, which is equivalent to the normal equations

$$\tilde\phi(-v) + \hat\alpha_m(1)\, \tilde\phi(1-v) + \ldots + \hat\alpha_m(m)\, \tilde\phi(m-v) = 0 \qquad (1)$$

for $v = 1, \ldots, m$. Next, the orthogonality conditions imply

$$\hat K_m = \int_0^1 \hat g_m(e^{2\pi i u})\, \tilde d(u)\,du
  = 1 + \hat\alpha_m(1)\, \tilde\phi(1) + \ldots + \hat\alpha_m(m)\, \tilde\phi(m) .$$

A rigorous theorem concerning the consistency in probability of $\hat d_m(u)$
as an estimator of d(u) can be proved by adapting the work of Carmichael
(1976) in his Ph.D. thesis on the autoregressive method for probability
density estimation.

Theorem (Carmichael (1976)). If

(1) $d(u)$, $d^{-1}(u)$, $\log d(u)$ are integrable;

(2) d(u) is bounded above and below in the sense
    $0 < d_1 \le d(u) \le d_2 < \infty$ a.e. in [0,1];

(3) $d(u) = c(u)$ a.e. in [0,1] and c satisfies, for some $\alpha > 0.5$,

    $$\sup_{|h| \le \delta} \int_0^1 |c(u+h) - c(u)|^2\,du = O(\delta^{2\alpha}) ;$$

(4) m is chosen as a function of the sample size n satisfying
    $\lim\, [\ldots] = 0$;

then as $n \to \infty$

$$\sup_{0 \le u \le 1} |\hat d_m(u) - d_\infty(u)| \to 0 \ \text{ in probability, }$$

where $d_\infty(u)$ is a density function with an infinite autoregressive
representation

$$d_\infty(u) = K_\infty \left| g_\infty(e^{2\pi i u}) \right|^{-2}$$

and satisfying $d_\infty(u) = d(u)$ a.e. in [0,1].

The proof of this beautiful theorem is being submitted for publication.

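An estimator $\hat d_m(u)$ of d(u) yields an estimator $\widehat{fQ}_m(u)$ of fQ
which is given explicitly by

$$\widehat{fQ}_m(u) = \frac{\left| 1 + \hat\alpha_m(1)\, e^{2\pi i u} + \ldots + \hat\alpha_m(m)\, e^{2\pi i u m} \right|^2 f_0Q_0(u)}
  {\int_0^1 \left| 1 + \hat\alpha_m(1)\, e^{2\pi i u} + \ldots + \hat\alpha_m(m)\, e^{2\pi i u m} \right|^2 f_0Q_0(u)\, \tilde q(u)\,du} ,$$

where $\hat\alpha_m(1), \ldots, \hat\alpha_m(m)$ are the solutions of the normal equations (1).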
A 

To  compare  our  autoregressive  estimator  fQ^(u)  with  other  possible 
estimators  one  must  realize  that  we  are  actually  estimating  the  triple  of 


functions  fQ  , q , 

and  Q , and  the  basic  aim  is  to  form  a smooth  function 

A 

Q which  is  an  estimator  of  Q . One  can  distinguish  three  general  approaches 

to  forming  estimators 

A 

Q which  we  call 

1. 

Parametric 

II. 

Non-parametrlc 

III. 

Non-parametric  pre-flattened. 

i 


The  parametric  approach  assumes  a location  and  scale  parameter  represen- 

A A 

tatlon  Q(u)  y + OQ^Cu), forms  efficiently  estimators  y and  o , and  then 


A A A 

Q(u)  * y + oQq(u)  as  the  estimator  of  Q . 


takes 
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t 

j 

i 

I! 

i 

l! 

1, 

'( 

;* 

1 

[■ 

i 

!; 

I 

:) 


I 


I 

i 

i 


The non-parametric approach estimates Q at a point u by averaging over the values of $\tilde Q(p)$ for p in a neighborhood of u. An estimator of this form is usually written as a kernel estimator

$$\hat Q(u) = \int_0^1 \tilde Q(p)\, \frac{1}{h} K\!\left(\frac{u-p}{h}\right) dp$$

for a suitable kernel K and bandwidth h. If one adopts the piecewise-linear definition of $\tilde Q$, one can differentiate this formula for $\hat Q$ to form a smooth estimator $\hat q$ of the quantile-density q:

$$\hat q(u) = \int_0^1 \tilde q(p)\, \frac{1}{h} K\!\left(\frac{u-p}{h}\right) dp.$$


Estimators of this form are in fact extensively studied in the literature of non-parametric density estimation (see Bofinger (1975), Moore and Yackel (1977)) under the name of "nearest neighbor density estimates." Another approach to fitting smooth curves $\hat q$ to the wiggly function $\tilde q$ is to use splines (see Wahba and Wold (1975)).
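
The kernel estimator of Q described above can be sketched numerically as follows. This is only an illustration under stated assumptions: the Epanechnikov kernel, the midpoint-rule integration grid, the renormalisation of the kernel weights near 0 and 1, and the function names `sample_quantile` and `kernel_quantile` are choices made here and do not come from the paper.

```python
import math

def sample_quantile(xs_sorted, u):
    """Piecewise-linear sample quantile function through the points
    ((2j-1)/(2n), X_(j)), held constant beyond the first and last points."""
    n = len(xs_sorted)
    t = n * u - 0.5                      # fractional (0-based) index of u among the knots
    if t <= 0:
        return xs_sorted[0]
    if t >= n - 1:
        return xs_sorted[-1]
    j = int(math.floor(t))
    frac = t - j
    return (1 - frac) * xs_sorted[j] + frac * xs_sorted[j + 1]

def kernel_quantile(xs, u, h, grid=2000):
    """Approximate the integral of Q_tilde(p) K((u - p)/h) / h over [0, 1]
    with an Epanechnikov kernel (an assumption), renormalising the weights
    so that the estimate stays a weighted average near the boundaries."""
    xs_sorted = sorted(xs)
    num = 0.0
    den = 0.0
    for i in range(grid):
        p = (i + 0.5) / grid             # midpoint rule on [0, 1]
        z = (u - p) / h
        if abs(z) < 1.0:
            w = 0.75 * (1.0 - z * z)     # Epanechnikov weight
            num += w * sample_quantile(xs_sorted, p)
            den += w
    return num / den
```

For data whose sample quantile function is exactly linear, a symmetric kernel leaves interior values unchanged, which gives a quick sanity check of the sketch.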

The foregoing estimators of q will have good properties only at a fixed value of u; the consistency of estimation becomes worse as u tends to 0 or 1 because $q(u)$ is in general a non-integrable function. This problem can be overcome by multiplying $q(u)$ by a factor $f_0Q_0(u)$ which makes the product $f_0Q_0(u)\,q(u)$ an integrable function which does not oscillate as much. When one smooths not $\tilde q(u)$ but $f_0Q_0(u)\,\tilde q(u)$, we call the approach non-parametric preflattened smoothing. We smooth $\tilde d(u) = f_0Q_0(u)\,\tilde q(u) \div \tilde\sigma_0$. One approach would be to form estimators of the form

$$\hat d(u) = \int_0^1 \tilde d(p)\, \frac{1}{h} K\!\left(\frac{u-p}{h}\right) dp.$$

It is difficult to use this approach in practice because of difficulties in optimally choosing h. We believe the autoregressive approach to density estimation goes a long way towards overcoming these difficulties.
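
The preflattened quantity $\tilde d(u)$ can be computed from a sample along the following lines. The sketch assumes the standard normal as base (so $f_0Q_0 = \phi\Phi^{-1}$), evaluates the spacings on the grid $u_j = j/n$, and replaces integrals by averages over that grid; the grid, the discretisation, and the names `normal_fq` and `weighted_spacings` are illustrative assumptions, not the paper's definitions.

```python
from statistics import NormalDist

_STD = NormalDist()

def normal_fq(u):
    """Density-quantile function phi(Phi^{-1}(u)) of the standard normal."""
    return _STD.pdf(_STD.inv_cdf(u))

def weighted_spacings(xs, base_fq=normal_fq):
    """Return (u, d, D, sigma0) on the grid u_j = j/n, j = 1..n-1.

    d_j  : weighted spacings f0Q0(u_j) * n * (X_(j+1) - X_(j)) divided by sigma0
    D_j  : running average of d, a discrete version of the cumulative function
    sigma0 : grid average of the unnormalised weighted spacings
    Raises ZeroDivisionError if all observations are equal.
    """
    x = sorted(xs)
    n = len(x)
    u = [j / n for j in range(1, n)]
    raw = [base_fq(u[j - 1]) * n * (x[j] - x[j - 1]) for j in range(1, n)]
    sigma0 = sum(raw) / (n - 1)
    d = [r / sigma0 for r in raw]
    D, total = [], 0.0
    for dj in d:
        total += dj / (n - 1)
        D.append(total)
    return u, d, D, sigma0
```

When the data really follow the base law up to location and scale, d should be close to 1 throughout, D close to the uniform distribution function, and sigma0 close to the scale.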

For the mathematical statistician, many problems are open for research concerning the asymptotic distributions of the foregoing estimators.

8. Computing Routines and Examples

A computer program which implements the data analysis approach described here has been developed by Prof. J. P. Carmichael and Mr. David Tritchler. Given a sample, it: (1) lists the order statistics, means, variances, etc.; (2) plots the normalized quantile function; (3) plots spacings. The $f_0Q_0$ functions of various familiar probability laws are available to be applied. For a specified $f_0Q_0$ function, the computer program (4) plots $\tilde d(u)$, the raw transformation-density function; (5) plots $\tilde D(u)$, the raw transformation-distribution function; (6) plots $|\tilde\varphi(v)|^2$, the square-modulus raw transformation-correlations. Next, for m = 1, 2, ..., the autoregressive approximator $\hat d_m(u)$ is computed, and its distribution function $\hat D_m(u)$ is plotted superimposed on a graph of $\tilde D(u)$ to enable one to see how well $\hat D_m(u)$ fits $\tilde D(u)$. Finally, CAT, a criterion to help determine the optimal order $\hat m$ of autoregressive approximation, is tabulated, and the order at which CAT achieves its minimum is determined. In addition, for each m the density-quantile estimator $\widehat{fQ}_m(u)$ corresponding to $\hat d_m$ is plotted. In the absence of a rigorous procedure for determining the optimal order $\hat m$, we choose those values of m for which $\hat D_m(u)$ "fits" $\tilde D(u)$.
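
The autoregressive step of such a program might be sketched as follows. The sketch takes correlations $\varphi(0), \ldots, \varphi(m)$ as given and assumes the conventions $\varphi(v) = \int_0^1 e^{2\pi i u v} d(u)\,du$, $\varphi(-v) = \overline{\varphi(v)}$, and normal equations $\sum_{j=0}^m \alpha(j)\varphi(j-k) = 0$ for $k = 1, \ldots, m$ with $\alpha(0) = 1$; these conventions, the plain Gaussian elimination, and the names `solve`, `ar_fit` and `ar_density` are assumptions made for illustration and may differ from the program described above. CAT is not computed here.

```python
import cmath
import math

def solve(a, b):
    """Solve the (possibly complex) linear system a x = b by Gaussian
    elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def ar_fit(phi, m):
    """Return (alpha(1..m), K) solving the assumed normal equations, where
    phi is the list [phi(0), ..., phi(m)] and K = sum_j alpha(j) phi(j)."""
    def ph(v):
        return phi[v] if v >= 0 else phi[-v].conjugate()
    a = [[ph(j - k) for j in range(1, m + 1)] for k in range(1, m + 1)]
    b = [-ph(-k) for k in range(1, m + 1)]
    alpha = solve(a, b) if m > 0 else []
    K = (ph(0) + sum(alpha[j - 1] * ph(j) for j in range(1, m + 1))).real
    return alpha, K

def ar_density(alpha, K, u):
    """Evaluate K / |1 + alpha(1) e^{2 pi i u} + ... + alpha(m) e^{2 pi i u m}|^2."""
    g = 1 + sum(a * cmath.exp(2j * math.pi * u * (i + 1)) for i, a in enumerate(alpha))
    return K / abs(g) ** 2
```

As a check on the conventions, the correlations $\varphi(v) = (-0.5)^{|v|}$ belong to the order-1 density with $\alpha(1) = 0.5$ and $K = 0.75$, and fitting order 2 to them should return a second coefficient of zero.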

Rayleigh example. Tukey (1977), p. 49, gives an example of data (Rayleigh's weights of a standard volume of "nitrogen" consisting of 15 measurements) which can be used to look hard at the advantages and disadvantages of graphical data analysis techniques. Rayleigh's observations in 1893-1894 established a discrepancy between the densities of nitrogen produced by removing the oxygen from air and nitrogen produced by decomposition of a chemical compound. This led him to investigate the composition of air chemically freed of oxygen, which in turn led to the discovery of argon, for which Rayleigh (1842-1919) was awarded the 1904 Nobel Prize in Physics.

We  may  define  the  goal  of  statistical  data  analysis  techniques  as 
follows:  on  the  one  hand,  to  enable  the  scientist  to  win  a Nobel  Prize;  on 
the  other  hand,  to  protect  the  statistician  from  being  sued  by  a scientist 
who  claims  that  using  the  statistician's  techniques  prevented  him  (her)  from 
winning  a Nobel  Prize. 

Tukey discusses how to present the data so as to make it quite clear that it separates into two quite isolated subgroups, which one interprets as indicating that the single batch of weights might be two batches of weights (as in fact they are, one for "nitrogen" from air, the other for "nitrogen" from other sources).

The presence of two batches will be indicated by the shapes of the empirical quantile function or spacings. However, I believe it is most clearly indicated by the presence of two modes in the estimated density-quantile function. We usually estimate $fQ$ taking as the base function $f_0Q_0$ the standard normal density, so that the procedure also provides a test of normality. The Rayleigh data is clearly non-normal. We take order m = 2 as an optimal autoregressive approximation (on the criterion of the fit of $\hat D_2(u)$ to $\tilde D(u)$) and obtain the estimated density-quantile function whose plot appears in Figure I; it is bimodal.

The reader may find it interesting to compare the density-quantile function plot in Figure I with Tukey's two batches box and whiskers plot in Tukey (1977), p. 51. Our left hand mode (representing "other than air" nitrogen measurements) is lower than the right hand mode (representing "from air" nitrogen measurements), indicating that the left mode population is more variable than the right mode population.

Buffalo snowfall example. The 63 yearly values of snow precipitation in Buffalo (recorded to the nearest tenth of an inch) from 1910-1972 have been extensively analyzed by Carmichael (1976) and Thaler (1972) to illustrate and compare various probability density estimation techniques. Different analyses have indicated either a uni-modal or tri-modal density, with the tri-modal shape usually regarded as the more likely answer. In our density-quantile estimation procedure, with base $f_0Q_0$ taken to be the standard normal, the order 0 and order 1 autoregressive estimators $\widehat{fQ}_m(u)$ are unimodal, and the order 2 autoregressive estimator $\widehat{fQ}_2(u)$ is tri-modal
(see  Figure  II).  However  all  our  D and  based  diagnostic  tests  of 

the hypothesis that Buffalo snowfall is normal confirm that it is.

Thus the tri-modal density estimator often obtained in previous analyses seems not to be correct. It is interesting that Tukey (1977), p. 117 also suggests Buffalo snowfall as an example for analyses (and gives the data for 1918-1937).


Figure I. Rayleigh data. Crosses represent the cumulative weighted spacings function $\tilde D$. The solid line represents the autoregressive estimator $\hat D_2$ of order 2.

Figure II. Rayleigh data. Autoregressive estimator $\widehat{fQ}$ of the density-quantile function. Order 2 chosen on the basis of the fit of $\hat D_2$ to $\tilde D$.

Figure III. Buffalo Snowfall Data. The sample quantile function $\tilde Q$ is in the upper left graph, the spacings or sample quantile-density function $\tilde q$ is in the lower left graph, the normal weighted spacings $\tilde d = \phi\Phi^{-1}\tilde q$ is in the lower right graph, and the cumulative weighted spacings $\tilde D$ is in the upper right graph.

Figure IV. Buffalo Snowfall Data. The upper left graph depicts $\tilde D$ by crosses and $\hat D_1$ by a solid line; the upper right graph depicts $\tilde D$ by crosses and $\hat D_2$ by a solid line. Autoregressive estimators $\widehat{fQ}_1$ and $\widehat{fQ}_2$ of orders 1 and 2 appear in the lower left and lower right graphs respectively.

9. Density-Quantile Classification of Probability Laws

An examination of the density-quantile functions $fQ(u)$ of familiar probability laws indicates that they can be classified according to their limiting behavior as u tends to 0 or 1. The behavior as $u \to 1$ can be described as either

$$fQ(u) \sim (1-u)^{\alpha}, \qquad \alpha > 0$$

or

$$fQ(u) \sim (1-u)\left\{\log\frac{1}{1-u}\right\}^{1-\beta}, \qquad 0 \le \beta \le 1$$

where $g_1(u) \sim g_2(u)$ means $g_1(u) \div g_2(u)$ tends to a positive finite constant (as $u \to 1$).

We call $\alpha$ the tail-exponent parameter and $\beta$ the shape parameter of a distribution. A rigorous definition of the tail exponent is given at the end of the section.

The parameter ranges $\alpha < 1$, $\alpha = 1$, and $\alpha > 1$ correspond to the statistician's perception that probability laws have three types of tail behavior:

I. SHORT TAILS OR LIMITED TYPE

II. MEDIUM TAILS OR EXPONENTIAL TYPE

III. LONG TAILS OR CAUCHY TYPE

The names limited type, exponential type, or Cauchy type are used in the theory of extreme value distributions to describe the types of distributions leading to the three types of extreme value distributions (see Gumbel (1962)).


The uniform distribution has $\alpha = 0$:

$$f(x) = 1, \quad 0 \le x \le 1; \qquad fQ(u) = 1, \quad 0 \le u \le 1.$$

An example of a short-tailed distribution is

$$f(x) = c(1-x)^{c-1}, \quad 0 \le x \le 1; \qquad fQ(u) = \frac{1}{\beta}(1-u)^{1-\beta}$$

where $c > 0$ and $\beta = 1/c$.

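Examples of exponential type distributions are

Exponential: $f(x) = e^{-x}$, $x > 0$; $fQ(u) = 1 - u$

Logistic: $f(x) = \dfrac{e^{x}}{(1+e^{x})^{2}}$, $-\infty < x < \infty$; $fQ(u) = u(1-u)$

Weibull ($c = 1/\beta > 0$): $f(x) = c\,x^{c-1}e^{-x^{c}}$, $x > 0$; $fQ(u) = \dfrac{1}{\beta}(1-u)\left\{\log\dfrac{1}{1-u}\right\}^{1-\beta}$

Extreme value: $f(x) = e^{x}e^{-e^{x}}$, $-\infty < x < \infty$; $fQ(u) = (1-u)\log\dfrac{1}{1-u}$

Normal: $\phi(x) = \dfrac{1}{\sqrt{2\pi}}e^{-x^{2}/2}$, $\Phi(x) = \displaystyle\int_{-\infty}^{x}\phi(y)\,dy$; $fQ(u) = \dfrac{1}{\sqrt{2\pi}}\exp\left\{-\tfrac12|\Phi^{-1}(u)|^{2}\right\} \sim (1-u)\left(2\log\dfrac{1}{1-u}\right)^{1/2}$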

It should be noted that in the $\beta$ parametrization of exponential type distributions (those for which $\alpha = 1$) the values $\beta = 0$, 0.5, and 1 correspond to the extreme-value, normal and exponential distributions respectively. It should also be noted that the $\beta$ parametrization does not cover all exponential type distributions; in particular it does not cover

Lognormal: $f(x) = \dfrac{1}{x}\phi(\log x)$; $fQ(u) = \phi\Phi^{-1}(u)\,e^{-\Phi^{-1}(u)}$

Examples of long tail distributions are

Cauchy: $f(x) = \dfrac{1}{\pi(1+x^{2})}$, $-\infty < x < \infty$; $fQ(u) = \dfrac{1}{\pi}\cos^{2}\pi(u - \tfrac12) = \dfrac{1}{\pi}\sin^{2}\pi u \sim (1-u)^{2}$

Pareto ($\beta > 0$): $f(x) = \dfrac{1}{\beta}x^{-(1+\beta)/\beta}$, $x > 1$; $Q(u) = (1-u)^{-\beta}$; $fQ(u) = \dfrac{1}{\beta}(1-u)^{1+\beta}$

Tukey ($\lambda < 1$): $Q(u) = \dfrac{1}{\lambda}\{u^{\lambda} - (1-u)^{\lambda}\}$; $fQ(u) = \dfrac{(1-u)^{\alpha}}{1 + u^{-\alpha}(1-u)^{\alpha}}$, $\alpha = 1 - \lambda$


The double-exponential distribution exemplifies another aspect of distributions which can be used to classify them: their differentiability.

Double-exponential: $f(x) = \tfrac12 e^{-|x|}$; $fQ(u) = u$ for $u < 0.5$, $fQ(u) = 1 - u$ for $u > 0.5$

The non-differentiability (at x = 0) of the double exponential density makes the density-quantile function non-differentiable at u = 0.5. Non-differentiability of the density is equivalent to the characteristic function

$$\varphi(u) = \int e^{iux} f(x)\,dx$$

decaying as $1/u^{2}$ as $u \to \infty$. Thus one can classify distributions according to the decay rate of (1) their densities and (2) their characteristic functions. The approach to statistical data analysis discussed in this paper basically assumes that the densities we are considering are differentiable in order to obtain reasonable rates of consistency for our estimators.

Given data, the parameters we desire to estimate for it are: location $\mu$, scale $\sigma$, tail-exponent $\alpha$, and (when $\alpha = 1$) shape $\beta$.

To efficiently estimate location and scale, one must know $f_0Q_0(u)$ or at least its tail exponent $\alpha$. A formula given by Andrews (1973) for the tail area of a distribution suggests a fundamental formula for the limiting behavior of $fQ$ functions as $u \to 1$, and also suggests a formula which might be used to rigorously define the tail exponent $\alpha$ of a distribution.

Andrews' tail area approximation formula may be written

$$1 - F(x) \approx \frac{f(x)}{-g(x)}\,\frac{1}{1-\kappa}$$

defining $g(x) = f'(x)/f(x) = \{\log f(x)\}'$ and

$$\kappa = \lim_{x \to \infty} \frac{g'(x)}{g^{2}(x)}.$$

In this formula, let $u = F(x)$. Then $gQ(u) = -J(u)$, $g'Q(u) = (fQ)''(u)\,fQ(u)$, and

$$1 - u \approx \alpha\,\frac{fQ(u)}{J(u)}$$

defining

$$\alpha = \frac{1}{1-\kappa}, \qquad \kappa = \lim_{u \to 1} \frac{fQ(u)\,(fQ)''(u)}{J^{2}(u)}.$$

The ranges $\alpha < 1$, $\alpha = 1$, $\alpha > 1$ correspond to $\kappa < 0$, $\kappa = 0$, and $\kappa > 0$ respectively.

We are thus led to a rigorous definition of the tail exponent $\alpha$:

$$\alpha = \lim_{u \to 1} \frac{(1-u)\,J(u)}{fQ(u)}.$$

This value of $\alpha$ satisfies approximately for u near 1

$$-\{\log fQ(u)\}' = \frac{J(u)}{fQ(u)} \approx \frac{\alpha}{1-u},$$

whence $\log fQ(u) \approx \alpha\log(1-u) + \text{constant}$, and

$$fQ(u) \sim (1-u)^{\alpha},$$

which is our intuitive definition of $\alpha$.
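
The limit defining the tail exponent can be checked numerically for the density-quantile functions listed earlier in this section. The sketch below uses $J(u) = -(fQ)'(u)$ and replaces both the derivative and the limit by finite approximations (a central difference at a point close to 1); the evaluation point, the step size, and the names `tail_exponent` and `LAWS` are illustrative assumptions.

```python
import math

def tail_exponent(fq, u=1 - 1e-6, h=1e-8):
    """Approximate alpha = lim_{u -> 1} (1 - u) J(u) / fQ(u), where
    J(u) = -(fQ)'(u) is estimated by a central finite difference at u."""
    score = -(fq(u + h) - fq(u - h)) / (2 * h)
    return (1 - u) * score / fq(u)

# Density-quantile functions taken from the examples in this section.
LAWS = {
    "exponential": lambda u: 1 - u,
    "Cauchy": lambda u: math.sin(math.pi * u) ** 2 / math.pi,
    "short-tailed (beta = 1/2)": lambda u: 2 * (1 - u) ** 0.5,
}

if __name__ == "__main__":
    for name, fq in LAWS.items():
        print(name, round(tail_exponent(fq), 4))
```

For these three laws the approximation should come out close to 1, 2 and 0.5 respectively. For laws such as the normal, where the logarithmic factor makes convergence slow, the value at any fixed u near 1 is only a rough guide.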

One can state a general assumption describing the densities for which the foregoing relations hold. We consider densities $f(x)$ which may have several modes (called multi-modal) but which do not have an infinite number of modes. We call such densities finitely-modal, defined as follows.

A density f is called finitely-modal if: (i) it is non-decreasing on an interval to the right of $a = \sup\{x : F(x) = 0\}$, and it is non-increasing on an interval to the left of $b = \inf\{x : F(x) = 1\}$, where $-\infty \le a < b \le \infty$, and (ii) there is a $\gamma > 0$ such that

$$\sup_{a < x < b} F(x)\{1 - F(x)\}\frac{|f'(x)|}{f^{2}(x)} \le \gamma$$

or equivalently

$$\sup_{0 < u < 1} u(1-u)\frac{|J(u)|}{fQ(u)} \le \gamma.$$

Finitely-modal densities are considered (without being so named) by Csorgo and Revesz (1978), who demonstrate that they enjoy strong approximations of the quantile process; in Section 10 we apply this fact to estimation of location and scale parameters.

An example of a distribution function which is not finitely-modal is


1 - F(x)  = exp  (-  X --  Sin  x) 


Letting  x = Q(u)  one  obtains  a relation  for  Q(u); 


-log  (1  - u)  = Q(u)  + 7 Sin  Q(u) 


whence 


= q(u)  (1  +|co8  Q(u)3 


fQ(u)  = (1  - u)  Cl  +7  Cos  Q(u)3 


As $u \to 1$, $Q(u) \to \infty$, and $fQ(u)$ oscillates. The hazard quantile function


hQ(u) 


= iOlHl  = 1 + i Cos  Q(u) 


1 - u 


also oscillates.

10. Estimation of Location and Scale Parameters

The problem of estimation of location and scale parameters $\mu$ and $\sigma$ usually arises when one assumes that the true distribution function F of X may be represented

$$F(x) = F_0\!\left(\frac{x-\mu}{\sigma}\right)$$

where $F_0$ is a known distribution function; we call this representation hypothesis $H_0$. An equivalent representation may be given for quantile functions:

$$Q(u) = \mu + \sigma Q_0(u).$$

When $F_0$ is not known it may be "estimated" from the data using a Goodness of Fit Test for the Hypothesis $H_0$. We have indicated how to find such goodness of fit tests as a special case of the problem of finding a function $\Psi$ such that $X = \Psi(Y)$, where Y has a specified distribution $F_0$. However our approach finds only the derivative $\psi(y) = \Psi'(y)$ and thus yields only a representation

$$\Psi(y) = \mu + \sigma\Psi_0(y)$$

where $\Psi_0$ is an indefinite integral of the estimated $\psi$. Then the quantile function Q of X has the representation

$$Q(u) = \mu + \sigma\Psi_0Q_0(u).$$

The parameters $\mu$ and $\sigma$ in this representation would be estimated in the same way one estimates any other pair of location and scale parameters.

Much work in the last twenty years has gone into showing how to obtain computationally simple asymptotically efficient estimators of location and scale parameters $\mu$ and $\sigma$ using linear combinations of order statistics.

I believe the basic conclusions of this vast effort can be compactly (and even rigorously) summarized by applying the theory of regression analysis on continuous parameter time series from the RKHS (reproducing kernel Hilbert space) point of view given by Parzen (1961), (1967).

A rigorous starting point is provided by the important theorems of Csorgo and Revesz (1978) on strong approximation of the quantile process.

Theorem. Let $X_1, \ldots, X_n$ be i.i.d. random variables with continuous d.f. F and differentiable density f which is finitely-modal and has tail exponent $\alpha$ (as defined at the end of Section 9). The quantile process $\tilde Q(u)$ is defined in terms of the order statistics $X_{(1)} \le \cdots \le X_{(n)}$; let $\tilde Q_U(u)$ be the quantile process of the uniformly distributed random variables $U_j = F(X_j)$. Let

$$R_n = \sup_{0 < u < 1} \sqrt{n}\,\left| fQ(u)\{\tilde Q(u) - Q(u)\} - \{\tilde Q_U(u) - u\} \right|.$$

Then almost surely

$$R_n = O(n^{-1/2}\log\log n) \quad \text{if } \alpha < 1,$$

$$R_n = O(n^{-1/2}(\log\log n)^{2}) \quad \text{if } \alpha = 1,$$

$$R_n = O(n^{-1/2}(\log\log n)^{\alpha}(\log n)^{(1+\epsilon)(\alpha-1)}) \quad \text{if } \alpha > 1,$$

where $\epsilon > 0$ is arbitrary.

To state a theorem concerning the behavior of the uniform quantile process $\tilde Q_U$, recall the definition of a Brownian Bridge $\{B(u), 0 \le u \le 1\}$; it is a zero mean normal process with covariance kernel

$$K_B(u_1,u_2) = \min(u_1,u_2) - u_1u_2.$$

Theorem. Csorgo and Revesz (1975). One can define a Brownian Bridge $\{B_n(u), 0 \le u \le 1\}$ for each n such that almost surely

$$\sup_{0 \le u \le 1} \left| \sqrt{n}\{\tilde Q_U(u) - u\} - B_n(u) \right| = O(n^{-1/2}\log n).$$

For purposes of statistical inference, we can interpret the foregoing results as follows: $\sqrt{n}\,fQ(u)\{\tilde Q(u) - Q(u)\}$ is distributed as a Brownian Bridge $B(u)$. Under the representation $Q(u) = \mu + \sigma Q_0(u)$ we obtain

$$\sqrt{n}\,f_0Q_0(u)\,\frac{1}{\sigma}\{\tilde Q(u) - \mu - \sigma Q_0(u)\} \sim B(u).$$

Estimating $\mu$ and $\sigma$ becomes a problem in regression analysis of continuous parameter time series by writing

$$f_0Q_0(u)\,\tilde Q(u) = \mu f_0Q_0(u) + \sigma Q_0(u) f_0Q_0(u) + \sigma_B B(u)$$

where $\sigma_B = \sigma/\sqrt{n}$.

We will consider estimators for $\mu$ and $\sigma$, treating $\sigma_B$ as a free parameter not constrained to be related to $\sigma$. We will find that estimators of $\sigma_B$ can be used to test the goodness of fit of the model.

The remainder of this section is devoted to writing explicit formulas for asymptotically efficient and unbiased estimators $\hat\mu$ and $\hat\sigma$, which are linear combinations of order statistics. These formulas assume $f_0Q_0$ and $Q_0$ are known; an open problem for research is the use of these formulas with smooth estimators $\widehat{f_0Q_0}$ and $\hat Q_0$ to provide adaptive estimators of $\mu$ and $\sigma$.

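Estimating $\mu$ and $\sigma$ given a possibly censored set of order statistics $X_{(np)}, \ldots, X_{(nq)}$ is more conveniently formulated as using the sample quantile function $\tilde Q(u)$ over a subinterval $p \le u \le q$ of $0 \le u \le 1$ (however we permit p = 0 or q = 1 as possible cases). To form the estimators $\hat\mu_{p,q}$ and $\hat\sigma_{p,q}$ based on this data we need to compute the reproducing kernel inner product $\langle f,g\rangle_{p,q}$ of functions on the interval $p \le u \le q$ corresponding to the kernel $K_B(u_1,u_2)$. We claim that this RKHS consists of $L_2$ differentiable functions with inner product

$$\langle f,g\rangle_{p,q} = \int_p^q f'(u)\,g'(u)\,du + \frac{1}{p}f(p)\,g(p) + \frac{1}{1-q}f(q)\,g(q).$$

The information matrix is

$$I(p,q) = \begin{pmatrix} I_{\mu\mu}(p,q) & I_{\mu\sigma}(p,q) \\ I_{\mu\sigma}(p,q) & I_{\sigma\sigma}(p,q) \end{pmatrix} = \begin{pmatrix} \langle f_0Q_0,\, f_0Q_0\rangle_{p,q} & \langle f_0Q_0,\, Q_0f_0Q_0\rangle_{p,q} \\ \langle f_0Q_0,\, Q_0f_0Q_0\rangle_{p,q} & \langle Q_0f_0Q_0,\, Q_0f_0Q_0\rangle_{p,q} \end{pmatrix}.$$

Define

$$T_{n,\mu,p,q} = \langle f_0Q_0,\, f_0Q_0\tilde Q\rangle_{p,q}, \qquad T_{n,\sigma,p,q} = \langle Q_0f_0Q_0,\, f_0Q_0\tilde Q\rangle_{p,q}.$$

Then the optimal estimators are given by

$$\begin{pmatrix} \hat\mu_{p,q} \\ \hat\sigma_{p,q} \end{pmatrix} = I^{-1}(p,q) \begin{pmatrix} T_{n,\mu,p,q} \\ T_{n,\sigma,p,q} \end{pmatrix}$$

with variance and covariance matrix

$$\begin{pmatrix} \mathrm{Var}(\hat\mu_{p,q}) & \mathrm{Cov}(\hat\mu_{p,q}, \hat\sigma_{p,q}) \\ \mathrm{Cov}(\hat\mu_{p,q}, \hat\sigma_{p,q}) & \mathrm{Var}(\hat\sigma_{p,q}) \end{pmatrix} = \frac{1}{n}\,\sigma^{2}\,I^{-1}(p,q).$$

These estimators may be justified also by their similarity to those given by Weiss and Wolfowitz (1970).

Finally to estimate $\sigma_B$ we would use an estimator denoted by $\hat\sigma_{B,p,q}$ which is formed from the residuals $\tilde Q(u) - \hat Q(u)$; define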


B.P.q 


V,<“>  ■ 

- 1 ^2 
B.P.q  n(q-p)  ^B,p,q 

If we are willing to accept the model, we could take as our estimator

$$\hat\sigma^{2}_{B,p,q} = \frac{1}{n}\,\hat\sigma^{2}_{p,q}$$

since $\sigma_B^{2} = \sigma^{2}/n$.

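In order to explicitly evaluate the inner product $\langle f,g\rangle_{p,q}$, it is often convenient to use no derivatives of g if one is willing to use second derivatives of f. Since $f'g' + f''g = (f'g)'$ we can write

$$\int_p^q f'g'\,du = -\int_p^q f''g\,du + f'g\,\Big|_p^q$$

so that

$$\langle f,g\rangle_{p,q} = -\int_p^q f''(u)\,g(u)\,du + g(p)\left[\frac{1}{p}f(p) - f'(p)\right] + g(q)\left[\frac{1}{1-q}f(q) + f'(q)\right].$$

Thus

$$\langle f_0Q_0,\, f_0Q_0\tilde Q\rangle_{p,q} = \int_p^q -(f_0Q_0)''(u)\,f_0Q_0(u)\,\tilde Q(u)\,du + \tilde Q(p)\,f_0Q_0(p)\left[\frac{1}{p}f_0Q_0(p) - (f_0Q_0)'(p)\right] + \tilde Q(q)\,f_0Q_0(q)\left[\frac{1}{1-q}f_0Q_0(q) + (f_0Q_0)'(q)\right]$$

$$\langle Q_0f_0Q_0,\, f_0Q_0\tilde Q\rangle_{p,q} = \int_p^q -\{Q_0f_0Q_0\}''(u)\,f_0Q_0(u)\,\tilde Q(u)\,du + \tilde Q(p)\,f_0Q_0(p)\left[\frac{1}{p}Q_0(p)f_0Q_0(p) - \{Q_0f_0Q_0\}'(p)\right] + \tilde Q(q)\,f_0Q_0(q)\left[\frac{1}{1-q}Q_0(q)f_0Q_0(q) + \{Q_0f_0Q_0\}'(q)\right].$$

To comprehend the linear functionals in $\tilde Q$ which appear in our formulas for $\hat\mu$ and $\hat\sigma$, define the weight functions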


$$W_\mu(u) = -(f_0Q_0)''(u)\,f_0Q_0(u) = J_0'(u)\,f_0Q_0(u)$$

$$W_\sigma(u) = -\{Q_0f_0Q_0\}''(u)\,f_0Q_0(u) = J_0(u) + Q_0(u)\,J_0'(u)\,f_0Q_0(u) = J_0(u) + Q_0(u)\,W_\mu(u).$$

Define the additional weight factors

$$W_{\mu L}(p) = f_0Q_0(p)\left\{\frac{1}{p}f_0Q_0(p) + J_0(p)\right\}$$

$$W_{\mu R}(q) = f_0Q_0(q)\left\{\frac{1}{1-q}f_0Q_0(q) - J_0(q)\right\}$$

$$W_{\sigma L}(p) = f_0Q_0(p)\left\{\frac{1}{p}Q_0(p)f_0Q_0(p) - 1 + Q_0(p)J_0(p)\right\} = Q_0(p)\,W_{\mu L}(p) - f_0Q_0(p)$$

$$W_{\sigma R}(q) = f_0Q_0(q)\left\{\frac{1}{1-q}Q_0(q)f_0Q_0(q) + 1 - Q_0(q)J_0(q)\right\} = Q_0(q)\,W_{\mu R}(q) + f_0Q_0(q).$$

The linear functionals of $\tilde Q$ which appear in $\hat\mu$ and $\hat\sigma$ may be written

$$T_{n,\mu,p,q} = \int_p^q W_\mu(u)\,\tilde Q(u)\,du + \tilde Q(p)\,W_{\mu L}(p) + \tilde Q(q)\,W_{\mu R}(q)$$

$$T_{n,\sigma,p,q} = \int_p^q W_\sigma(u)\,\tilde Q(u)\,du + \tilde Q(p)\,W_{\sigma L}(p) + \tilde Q(q)\,W_{\sigma R}(q).$$

These integrals are really linear combinations of order statistics if we take $\tilde Q(u)$ to be a piecewise constant function equal to $X_{(j)}$ for $(j-1)/n < u \le j/n$.
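
To illustrate how these functionals reduce to linear combinations of order statistics, consider the standard normal base with no censoring (p = 0, q = 1). The sketch below assumes that in this case $W_\mu(u) = 1$, $W_\sigma(u) = 2\Phi^{-1}(u)$, $I_{\mu\mu} = 1$, $I_{\sigma\sigma} = 2$, $I_{\mu\sigma} = 0$, and that the boundary terms vanish in the limit, so that $\hat\mu = \int_0^1 \tilde Q(u)\,du$ and $\hat\sigma = \int_0^1 \Phi^{-1}(u)\,\tilde Q(u)\,du$. With piecewise constant $\tilde Q$ the coefficient of $X_{(j)}$ in $\hat\sigma$ is the integral of $\Phi^{-1}$ over $((j-1)/n, j/n]$, which has the closed form $\phi\Phi^{-1}((j-1)/n) - \phi\Phi^{-1}(j/n)$. The function names are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()

def _phi_at_quantile(u):
    """phi(Phi^{-1}(u)), taken as 0 at the endpoints u = 0 and u = 1."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return _STD.pdf(_STD.inv_cdf(u))

def normal_location_scale(xs):
    """Location and scale estimates for an uncensored sample under a normal
    base, written as linear combinations of the order statistics."""
    x = sorted(xs)
    n = len(x)
    mu_hat = sum(x) / n                  # integral of Q_tilde(u) du with W_mu = 1
    sigma_hat = 0.0
    for j in range(1, n + 1):
        # Integral of Phi^{-1}(u) over ((j-1)/n, j/n], using the
        # antiderivative -phi(Phi^{-1}(u)).
        c_j = _phi_at_quantile((j - 1) / n) - _phi_at_quantile(j / n)
        sigma_hat += c_j * x[j - 1]
    return mu_hat, sigma_hat
```

The coefficients of $\hat\sigma$ sum to zero, so the estimate is unaffected by a shift of the data, and it scales linearly with the data, as a scale estimator should.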


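The entries of the information matrix may be written:

$$I_{\mu\mu}(p,q) = \int_p^q |J_0(u)|^{2}\,du + \frac{1}{p}|f_0Q_0(p)|^{2} + \frac{1}{1-q}|f_0Q_0(q)|^{2} = \int_p^q W_\mu(u)\,du + W_{\mu L}(p) + W_{\mu R}(q)$$

$$I_{\mu\sigma}(p,q) = -\int_p^q J_0(u)\{1 - Q_0(u)J_0(u)\}\,du + \frac{1}{p}Q_0(p)|f_0Q_0(p)|^{2} + \frac{1}{1-q}Q_0(q)|f_0Q_0(q)|^{2}$$

$$= \int_p^q W_\mu(u)\,Q_0(u)\,du + Q_0(p)\,W_{\mu L}(p) + Q_0(q)\,W_{\mu R}(q) = \int_p^q W_\sigma(u)\,du + W_{\sigma L}(p) + W_{\sigma R}(q)$$

$$I_{\sigma\sigma}(p,q) = \int_p^q |1 - Q_0(u)J_0(u)|^{2}\,du + \frac{1}{p}|Q_0(p)f_0Q_0(p)|^{2} + \frac{1}{1-q}|Q_0(q)f_0Q_0(q)|^{2}$$

$$= \int_p^q W_\sigma(u)\,Q_0(u)\,du + Q_0(p)\,W_{\sigma L}(p) + Q_0(q)\,W_{\sigma R}(q).$$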

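In the case of a symmetric density $f_0(x) = f_0(-x)$, we have

$$J_0(1-u) = -J_0(u), \qquad Q_0(1-u) = -Q_0(u).$$

For the case of censorship which is symmetric in the sense that $q = 1 - p$, $I_{\mu\sigma}(p,q) = 0$ and

$$\hat\mu_{p,q} = \frac{T_{n,\mu,p,q}}{I_{\mu\mu}(p,q)}, \qquad \hat\sigma_{p,q} = \frac{T_{n,\sigma,p,q}}{I_{\sigma\sigma}(p,q)}.$$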

Symmetrically censored normal samples are an important case which we discuss in detail. For the normal distribution

$$f_0Q_0(u) = \frac{1}{\sqrt{2\pi}}\exp\left\{-\tfrac12|\Phi^{-1}(u)|^{2}\right\},$$

$$W_\mu(u) = 1, \qquad W_\sigma(u) = 2\Phi^{-1}(u),$$

$$W_{\mu R}(q) = f_0Q_0(q)\left\{\frac{1}{1-q}f_0Q_0(q) - \Phi^{-1}(q)\right\},$$

$$W_{\sigma R}(q) = f_0Q_0(q)\left\{1 + \Phi^{-1}(q)\left(\frac{1}{1-q}f_0Q_0(q) - \Phi^{-1}(q)\right)\right\}.$$

In order to study the behavior of these weights as $q \to 1$, we note an important property of the normal distribution [which follows from Feller, Vol. 1, p. 166, eq. (1.8)]:

0 1 *o0o'«>  • i ‘■J”' 

which tends to 0 as q tends to 1.

11. Some open research problems for extensions

The approach described in this paper can be described as one which formulates statistical estimation and testing problems as problems of density estimation and testing for white noise. This paper discussed only the univariate one-sample case. Two-sample and multivariate (including non-parametric regression) problems can be treated similarly (see Parzen (1977)). This section describes some extensions of our results in the one-sample case whose theory and application are open for research.

Power Transformation to Normality. The transformation of a random variable X to a $N(\mu,\sigma^{2})$ distribution is often assumed to be of the form

$$\Psi(x) = \frac{1}{\lambda}\{(x-\xi)^{\lambda} - 1\}, \quad \lambda \ne 0; \qquad \Psi(x) = \log(x-\xi), \quad \lambda = 0.$$

The derivative $\psi(x) = \Psi'(x)$ has a single formula

$$\psi(x) = (x-\xi)^{\lambda-1}.$$

The quantile function $Q(u)$ of X is then related to the standard normal quantile function $\Phi^{-1}(u)$ by

$$\mu + \sigma\Phi^{-1}(u) = \frac{1}{\lambda}\{(Q(u)-\xi)^{\lambda} - 1\}, \quad \lambda \ne 0; \qquad \mu + \sigma\Phi^{-1}(u) = \log(Q(u)-\xi), \quad \lambda = 0.$$

The density-quantile function of X satisfies

$$\log fQ(u) = -\log\sigma + \log\phi\Phi^{-1}(u) + (\lambda-1)\log(Q(u)-\xi).$$

The problem is: (1) to use these relations to estimate the parameters $\lambda$ and $\xi$; and (2) to compare these estimators with the estimators of Box and Cox (1964).
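
The transformation and its derivative are simple to evaluate; a minimal sketch follows. The function names and the choice to raise an error outside the domain are illustrative assumptions, and the sketch does not address the open estimation problem itself.

```python
import math

def power_transform(x, lam, xi=0.0):
    """Psi(x) = ((x - xi)**lam - 1) / lam for lam != 0, log(x - xi) for lam == 0."""
    if x <= xi:
        raise ValueError("power_transform requires x > xi")
    if lam == 0:
        return math.log(x - xi)
    return ((x - xi) ** lam - 1.0) / lam

def power_transform_derivative(x, lam, xi=0.0):
    """psi(x) = (x - xi)**(lam - 1), the single formula valid for every lam."""
    if x <= xi:
        raise ValueError("power_transform_derivative requires x > xi")
    return (x - xi) ** (lam - 1.0)
```

The logarithmic case is the limit of the power case as the exponent tends to 0, which is why one derivative formula covers both.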

Survival data. Let $X_1, \ldots, X_n$ be a random sample from a single lifetime or survival distribution F with quantile function Q. However one may fail to observe an X (called a "death") due to the previous occurrence of some other event Y (called a "loss") which has distribution H. The desired value X is censored on the right by Y, and one observes

$$Z = \min(X,Y)$$

with distribution function G satisfying

$$1 - G = (1-F)(1-H)$$

under suitable independence assumptions.

From the observed data $Z_1, \ldots, Z_n$ one can form an estimator $\hat F$ of F introduced by Kaplan and Meier (1958). Its quantile function $\hat Q$ is an estimator of Q. The asymptotic distribution theory of $\hat F$ and $\hat Q$ has been found by Breslow and Crowley (1974) and Sanders (1975) respectively; the latter shows that $\sqrt{n}\,fQ(u)\{\hat Q(u) - Q(u)\}$, $0 < u < 1$, converges in distribution (as a stochastic process) to a zero mean Gaussian process with covariance kernel K given by

$$K(u_1,u_2) = (1-u_1)(1-u_2)\int_0^{\min(u_1,u_2)} dw\,(1-w)^{-2}\{1 - HQ(w)\}^{-1}.$$

When there is no censoring, H = 0 and $K(u_1,u_2) = \min(u_1,u_2) - u_1u_2$, the covariance kernel of the Brownian bridge.

The covariance kernel K has an integral representation which makes it easy to find its RKHS inner product. Thus one would have no difficulty extending the results of Section 10 to estimation of location and scale parameters from survival data.
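
For readers who wish to experiment, the Kaplan and Meier estimator $\hat F$ and its quantile function can be sketched as follows. This is the standard product-limit construction written from the usual textbook definition; the function names, the tie-handling (deaths and losses at the same time are removed from the risk set together), and the convention $\hat Q(u) = \inf\{t : \hat F(t) \ge u\}$ are assumptions of the sketch.

```python
def kaplan_meier(times, observed):
    """Product-limit estimate of F from right-censored data.

    times[i] is the observed value Z_i = min(X_i, Y_i); observed[i] is True
    for a death (X_i observed) and False for a loss. Returns a list of
    (t, F_hat(t)) pairs at the distinct death times, in increasing order.
    """
    data = sorted(zip(times, observed))
    n_at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        ties = 0
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            ties += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            steps.append((t, 1.0 - survival))
        n_at_risk -= ties
    return steps

def km_quantile(steps, u):
    """Smallest death time t with F_hat(t) >= u, or None if F_hat never reaches u."""
    for t, F in steps:
        if F >= u:
            return t
    return None
```

When no observation is censored, the estimate reduces to the ordinary empirical distribution function.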


Sampling the Quantile Process. Suppose that to compress the data one seeks to reduce a sample of size n to k values, namely the order statistics $X_{(np_j)} = \tilde Q(p_j)$ corresponding to specified percentiles $p_1, \ldots, p_k$. One can choose these percentiles so that the optimal linear estimators $\hat\mu$ and $\hat\sigma$ that could be formed from them have variances which are a minimum over all choices of k points at which to sample $\tilde Q(u)$. Results of this kind could be deduced from the work of Sacks and Ylvisaker (1966) on designs of continuous parameter time series regression problems.